Barcelona's Digital Landscape: a data-driven exploration of urban dynamics around Sagrada Familia. AI-generated by our team.

1. Introduction

1.1 Presentation of the case

A tourism company based in Zürich, Switzerland, has observed a significant increase in travel demand to Barcelona in recent years. Indeed, Barcelona ranks as the third most in-demand city for Airbnb rentals in Europe, behind Paris and London.

Consequently, the company’s manager has requested a Machine Learning study and analysis of Airbnb accommodations in the city. The goal is to understand price behavior and identify the factors influencing accommodation costs and occupancy, enabling the company to provide optimal responses to clients’ inquiries.

To achieve this goal, the team has decided to address three questions that together provide comprehensive insights for the manager.

  1. What are the key factors influencing accommodation prices in Barcelona?
  2. Can we predict occupancy rates based on location, amenities, or other factors?
  3. How accurately can machine learning predict Airbnb prices in Barcelona? Which models perform best for this dataset?

1.2 Motivations

Barcelona is one of the most visited cities in Europe, and the rise of Airbnb and other short-term rental platforms has led to a notable increase in tourism. However, this growth also presents challenges for accommodation businesses and the local housing market. A study published on the Social Science Research Network (SSRN, link: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3428237) found that rental costs in neighborhoods with high Airbnb activity increased by 7% between 2009 and 2016. This is primarily because property owners, motivated by tourist demand for short-term rentals, frequently opt to let their properties short term at higher rates rather than commit to long-term leases.

For these reasons, it is crucial for tourism companies to understand this dynamic market to remain competitive and provide tailored services to their clients.

The Zürich-based tourism company needs reliable data on Airbnb prices and occupancy rates to make data-driven recommendations and stay ahead of competitors.

1.3 Disclaimer

This analysis is for educational purposes only. The findings are based on public data and are not professional advice. The results should not be used for business or policy decisions.

1.4 Dataset selected

To conduct the study, the team has decided to analyse a dataset of Barcelona Airbnb listings available on the Kaggle website (link: https://www.kaggle.com/datasets/fermatsavant/airbnb-dataset-of-barcelona-city).

The dataset consists of 19,833 observations across 25 variables, including geographical zones, amenities, prices, and accommodation characteristics.

Below, we can see the structure of the dataset, and the names and data types for each column.

## Rows: 19,833
## Columns: 25
## $ X                     <int> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14…
## $ id                    <int> 18666, 18674, 21605, 23197, 25786, 31377, 31380,…
## $ host_id               <int> 71615, 71615, 82522, 90417, 108310, 134698, 1346…
## $ host_is_superhost     <chr> "f", "f", "f", "t", "t", "f", "f", "f", "f", "f"…
## $ host_listings_count   <int> 45, 45, 2, 5, 1, 9, 9, 41, 41, 1, NA, 3, 3, 4, 4…
## $ neighbourhood         <chr> "Sant Martí", "La Sagrada Família", "Sant Martí"…
## $ zipcode               <chr> "8026", "8025", "8018", "8930", "8012", "8025", …
## $ latitude              <dbl> 41.40889, 41.40420, 41.40560, 41.41203, 41.40145…
## $ longitude             <dbl> 2.18555, 2.17306, 2.19821, 2.22114, 2.15645, 2.1…
## $ property_type         <chr> "Apartment", "Apartment", "Apartment", "Apartmen…
## $ room_type             <chr> "Entire home/apt", "Entire home/apt", "Private r…
## $ accommodates          <int> 6, 8, 2, 6, 2, 2, 3, 4, 5, 1, 6, 2, 8, 2, 1, 1, …
## $ bathrooms             <dbl> 1.0, 2.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.5, 1.0…
## $ bedrooms              <int> 2, 3, 1, 3, 1, 1, 1, 1, 3, 1, 2, 1, 4, 1, 1, 1, …
## $ beds                  <int> 4, 6, 1, 8, 1, 2, 2, 1, 3, 1, 7, 1, 6, 1, 1, 1, …
## $ amenities             <chr> "['TV', 'Internet', 'Wifi', 'Air conditioning', …
## $ price                 <chr> "$130.00", "$60.00", "$33.00", "$210.00", "$45.0…
## $ minimum_nights        <int> 3, 1, 2, 3, 1, 3, 3, 1, 1, 29, 2, 4, 5, 2, 2, 2,…
## $ has_availability      <chr> "t", "t", "t", "t", "t", "t", "t", "t", "t", "t"…
## $ availability_30       <int> 0, 3, 4, 11, 8, 5, 3, 2, 3, 4, 2, 25, 9, 1, 3, 2…
## $ availability_60       <int> 0, 20, 8, 33, 19, 8, 8, 17, 19, 4, 16, 55, 31, 3…
## $ availability_90       <int> 0, 50, 15, 63, 41, 16, 15, 29, 31, 4, 42, 80, 61…
## $ availability_365      <int> 182, 129, 15, 318, 115, 211, 211, 266, 257, 26, …
## $ number_of_reviews_ltm <int> 0, 10, 36, 16, 49, 0, 2, 34, 15, 0, 10, 0, 24, 6…
## $ review_scores_rating  <int> 80, 87, 90, 95, 95, 95, 87, 92, 88, 99, 87, 68, …

Numerical values:

  • X: numerical index for rows
  • id: unique identifier for listings
  • host_id: unique identifier for hosts
  • host_listings_count: number of listings by the host
  • latitude: geographic latitude of the listing
  • longitude: geographic longitude of the listing
  • accommodates: number of guests the listing can accommodate
  • bathrooms: number of bathrooms in the listing
  • bedrooms: number of bedrooms in the listing
  • beds: number of beds in the listing
  • minimum_nights: minimum number of nights required for booking
  • availability_30: number of available nights in the next 30 days
  • availability_60: number of available nights in the next 60 days
  • availability_90: number of available nights in the next 90 days
  • availability_365: number of available nights in the next 365 days
  • number_of_reviews_ltm: number of reviews in the last 12 months
  • review_scores_rating: average review rating score

Binary variables:

  • host_is_superhost: indicates if the host is a superhost ("t" or "f")
  • has_availability: indicates if the listing is available for booking ("t" or "f")

String values:

  • neighbourhood: name of the neighbourhood where the listing is located
  • zipcode: postal code of the listing
  • property_type: type of property (e.g., “Apartment”)
  • room_type: type of room (e.g., “Entire home/apt”)
  • amenities: list of amenities provided in the listing
  • price: price of the listing as a string (e.g., “$130.00”)

As can be seen, the variable ‘price’ has a character data type. Therefore, in Chapter 3, this field will be converted into a numeric variable to enable the necessary calculations.

1.5 Sub-sampling

To streamline the calculations and analysis, a sub-dataset of 10,000 randomly selected observations is created in the following steps. Additionally, a seed is set to ensure that the same observations are kept throughout the analysis.

## Rows: 10,000
## Columns: 25
## $ X                     <int> 16886, 3429, 3695, 3051, 11158, 8191, 18373, 172…
## $ id                    <int> 34191752, 6787210, 7555948, 5767967, 24448300, 1…
## $ host_id               <int> 224372816, 15681396, 3911721, 2151490, 163379623…
## $ host_is_superhost     <chr> "t", "f", "f", "f", "f", "f", "f", "f", "f", "t"…
## $ host_listings_count   <int> 9, 6, 39, 6, 109, 1, 0, 16, 32, 2, 32, 2, 91, 1,…
## $ neighbourhood         <chr> "Ciutat Vella", "La Verneda i La Pau", "Camp d'e…
## $ zipcode               <chr> "8001", "8020", "8025", "8001", "8014", "8041", …
## $ latitude              <dbl> 41.37876, 41.42130, 41.40366, 41.38471, 41.37340…
## $ longitude             <dbl> 2.16882, 2.20321, 2.17096, 2.16538, 2.14020, 2.1…
## $ property_type         <chr> "Apartment", "Apartment", "Apartment", "Apartmen…
## $ room_type             <chr> "Entire home/apt", "Private room", "Entire home/…
## $ accommodates          <int> 5, 5, 6, 16, 4, 4, 1, 2, 2, 4, 4, 1, 1, 2, 2, 2,…
## $ bathrooms             <dbl> 1.0, 1.0, 1.0, 6.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.5…
## $ bedrooms              <int> 2, 2, 2, 7, 2, 1, 1, 0, 1, 1, 2, 3, 1, 1, 1, 1, …
## $ beds                  <int> 5, 5, 5, 13, 2, 2, 1, 1, 2, 2, 3, 1, 1, 1, 1, 2,…
## $ amenities             <chr> "['TV', 'Cable TV', 'Wifi', 'Air conditioning', …
## $ price                 <chr> "$105.00", "$25.00", "$85.00", "$899.00", "$83.0…
## $ minimum_nights        <int> 32, 1, 1, 2, 1, 2, 1, 1, 3, 1, 3, 30, 31, 3, 3, …
## $ has_availability      <chr> "t", "t", "t", "t", "t", "t", "t", "t", "t", "t"…
## $ availability_30       <int> 9, 18, 21, 3, 2, 0, 10, 17, 19, 17, 15, 9, 9, 27…
## $ availability_60       <int> 39, 48, 51, 24, 16, 0, 30, 47, 23, 47, 45, 39, 3…
## $ availability_90       <int> 40, 78, 71, 47, 26, 0, 60, 77, 42, 59, 75, 69, 6…
## $ availability_365      <int> 40, 353, 327, 241, 297, 0, 335, 352, 127, 64, 16…
## $ number_of_reviews_ltm <int> 0, 12, 2, 7, 0, 0, 2, 4, 9, 21, 2, 0, 0, 0, 13, …
## $ review_scores_rating  <int> NA, 83, 90, 92, NA, 96, 100, 90, 86, 97, 90, 95,…
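
The sub-sampling step described above could be sketched as follows. This is a minimal illustration, not the team's actual code: the data frame BCN_Accomm_full is a hypothetical stand-in for the full dataset, and the seed value 123 is an assumption, since the report does not state which seed was used.

```r
# Hypothetical stand-in for the full dataset (19,833 rows in the report)
BCN_Accomm_full <- data.frame(id = seq_len(19833))

set.seed(123)  # assumed seed value; any fixed integer gives reproducibility

# Draw 10,000 row indices without replacement and keep those rows
sub_rows <- sample(nrow(BCN_Accomm_full), size = 10000)
BCN_Accomm_sub <- BCN_Accomm_full[sub_rows, , drop = FALSE]

dim(BCN_Accomm_sub)
```

Setting the seed immediately before the call to sample() means that re-running the script selects the same 10,000 rows.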

2. Methodology

To address the research questions, the study is divided into three parts. First, an Exploratory Data Analysis (EDA) is conducted to gain a deeper understanding of the data. Second, Machine Learning models are implemented and their performance is evaluated to identify the best-performing model. Finally, the selected model is used to provide the most accurate answer to the research questions posed by the team.

The different models to be developed are:

  1. Linear Model (FB)
  2. Generalised Linear Model with family set to Poisson for count data (FB)
  3. Generalised Linear Model with family set to Binomial for binomial and multinomial data (MD)
  4. Generalised Additive Model (CR)
  5. Neural Network (CR)
  6. Support Vector Machine (MD)

3. Exploratory Data Analysis

3.1 Converting Price variable to numeric

First, the pricing variable will be converted into a numeric format, and in the fifth chapter of this report (Machine Learning Models), the categorical variables will be transformed into factors for further analysis and modeling.

BCN_Accomm_sub$price <- gsub(",", "", BCN_Accomm_sub$price) # removed ','
BCN_Accomm_sub$price <- gsub("\\$", "", BCN_Accomm_sub$price) # removed '$' sign
BCN_Accomm_sub$price <- as.numeric(BCN_Accomm_sub$price)  # converted to number format
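
A quick check of this conversion on a few sample strings can confirm that no value silently turns into NA. The vector below is illustrative only: the first two strings follow the format shown in the dataset structure, while the third, with a thousands separator, is an assumed example.

```r
# Illustrative price strings; "$1,250.00" is an assumed example value
raw_price <- c("$130.00", "$60.00", "$1,250.00")

# Remove the dollar sign and the thousands separator in one pass,
# then convert the remaining text to a number
clean_price <- as.numeric(gsub("[$,]", "", raw_price))

clean_price
anyNA(clean_price)  # TRUE here would signal a string that failed to parse
```

Combining both substitutions in a single character class is equivalent to the two separate gsub() calls above.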

3.2 Missing values (detect and treat) (MD)

##                     X                    id               host_id 
##                     0                     0                     0 
##     host_is_superhost   host_listings_count         neighbourhood 
##                     0                    22                     0 
##               zipcode              latitude             longitude 
##                     0                     0                     0 
##         property_type             room_type          accommodates 
##                     0                     0                     0 
##             bathrooms              bedrooms                  beds 
##                     6                     2                    18 
##             amenities                 price        minimum_nights 
##                     0                     0                     0 
##      has_availability       availability_30       availability_60 
##                     0                     0                     0 
##       availability_90      availability_365 number_of_reviews_ltm 
##                     0                     0                     0 
##  review_scores_rating 
##                  2415

##      X id host_id host_is_superhost neighbourhood zipcode latitude longitude
## 7552 1  1       1                 1             1       1        1         1
## 2401 1  1       1                 1             1       1        1         1
## 17   1  1       1                 1             1       1        1         1
## 5    1  1       1                 1             1       1        1         1
## 9    1  1       1                 1             1       1        1         1
## 8    1  1       1                 1             1       1        1         1
## 4    1  1       1                 1             1       1        1         1
## 1    1  1       1                 1             1       1        1         1
## 1    1  1       1                 1             1       1        1         1
## 2    1  1       1                 1             1       1        1         1
##      0  0       0                 0             0       0        0         0
##      property_type room_type accommodates amenities price minimum_nights
## 7552             1         1            1         1     1              1
## 2401             1         1            1         1     1              1
## 17               1         1            1         1     1              1
## 5                1         1            1         1     1              1
## 9                1         1            1         1     1              1
## 8                1         1            1         1     1              1
## 4                1         1            1         1     1              1
## 1                1         1            1         1     1              1
## 1                1         1            1         1     1              1
## 2                1         1            1         1     1              1
##                  0         0            0         0     0              0
##      has_availability availability_30 availability_60 availability_90
## 7552                1               1               1               1
## 2401                1               1               1               1
## 17                  1               1               1               1
## 5                   1               1               1               1
## 9                   1               1               1               1
## 8                   1               1               1               1
## 4                   1               1               1               1
## 1                   1               1               1               1
## 1                   1               1               1               1
## 2                   1               1               1               1
##                     0               0               0               0
##      availability_365 number_of_reviews_ltm bedrooms bathrooms beds
## 7552                1                     1        1         1    1
## 2401                1                     1        1         1    1
## 17                  1                     1        1         1    1
## 5                   1                     1        1         1    1
## 9                   1                     1        1         1    0
## 8                   1                     1        1         1    0
## 4                   1                     1        1         0    1
## 1                   1                     1        1         0    1
## 1                   1                     1        1         0    0
## 2                   1                     1        0         1    1
##                     0                     0        2         6   18
##      host_listings_count review_scores_rating     
## 7552                   1                    1    0
## 2401                   1                    0    1
## 17                     0                    1    1
## 5                      0                    0    2
## 9                      1                    1    1
## 8                      1                    0    2
## 4                      1                    1    1
## 1                      1                    0    2
## 1                      1                    1    2
## 2                      1                    1    1
##                       22                 2415 2463

Missing Values by Variable

Variable                Missing_Count   Missing_Percent
review_scores_rating             2415            24.15%
host_listings_count                22             0.22%
beds                               18             0.18%
bathrooms                           6             0.06%
bedrooms                            2             0.02%
X                                   0                0%
id                                  0                0%
host_id                             0                0%
host_is_superhost                   0                0%
neighbourhood                       0                0%
zipcode                             0                0%
latitude                            0                0%
longitude                           0                0%
property_type                       0                0%
room_type                           0                0%
accommodates                        0                0%
amenities                           0                0%
price                               0                0%
minimum_nights                      0                0%
has_availability                    0                0%
availability_30                     0                0%
availability_60                     0                0%
availability_90                     0                0%
availability_365                    0                0%
number_of_reviews_ltm               0                0%
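
A per-variable count like the one above can be obtained in base R along the following lines. This is only a sketch on a small made-up data frame (toy and its values are hypothetical), not the code that produced the table.

```r
# Small made-up data frame with a few missing entries
toy <- data.frame(
  beds      = c(1, NA, 2, 3),
  bathrooms = c(1, 1, NA, NA),
  price     = c(50, 60, 70, 80)
)

# Count NA values per column and express them as a share of all rows
missing_count <- colSums(is.na(toy))
missing_table <- data.frame(
  Missing_Count   = missing_count,
  Missing_Percent = sprintf("%.2f%%", 100 * missing_count / nrow(toy)),
  row.names       = names(missing_count)
)

# Sort so that the variables with the most missing values come first
missing_table <- missing_table[order(-missing_table$Missing_Count), ]
missing_table
```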

3.2.1 How to manage Missing values for each field (MD)

Missing values are handled field by field:

  • host_listings_count: the number of listings per host cannot be estimated from the other fields, so the 22 rows that lack it are excluded.
  • bathrooms: the number of bathrooms is missing in 6 rows.
  • beds: the number of beds is not specified for 18 listings.
  • bedrooms: 2 rows contain missing values and can be deleted.
  • review_scores_rating: the review score is missing in 2,415 of the 10,000 rows, a substantial share of about 24% of the selected data. In this case, the missing values are imputed by replacing them with 0.

##      X id host_id host_is_superhost neighbourhood zipcode latitude longitude
## 9953 1  1       1                 1             1       1        1         1
## 22   1  1       1                 1             1       1        1         1
## 17   1  1       1                 1             1       1        1         1
## 5    1  1       1                 1             1       1        1         1
## 1    1  1       1                 1             1       1        1         1
## 2    1  1       1                 1             1       1        1         1
##      0  0       0                 0             0       0        0         0
##      property_type room_type accommodates amenities price minimum_nights
## 9953             1         1            1         1     1              1
## 22               1         1            1         1     1              1
## 17               1         1            1         1     1              1
## 5                1         1            1         1     1              1
## 1                1         1            1         1     1              1
## 2                1         1            1         1     1              1
##                  0         0            0         0     0              0
##      has_availability availability_30 availability_60 availability_90
## 9953                1               1               1               1
## 22                  1               1               1               1
## 17                  1               1               1               1
## 5                   1               1               1               1
## 1                   1               1               1               1
## 2                   1               1               1               1
##                     0               0               0               0
##      availability_365 number_of_reviews_ltm review_scores_rating bedrooms
## 9953                1                     1                    1        1
## 22                  1                     1                    1        1
## 17                  1                     1                    1        1
## 5                   1                     1                    1        1
## 1                   1                     1                    1        1
## 2                   1                     1                    1        0
##                     0                     0                    0        2
##      bathrooms beds host_listings_count   
## 9953         1    1                   1  0
## 22           1    1                   0  1
## 17           1    0                   1  1
## 5            0    1                   1  1
## 1            0    0                   1  2
## 2            1    1                   1  1
##              6   18                  22 48

Finally, the remaining fields with missing values do not lend themselves to imputation, that is, to replacing the empty entries with estimates (mean, median, …). According to the pattern table above, they affect 47 rows containing 48 missing values in total (one row lacks both bathrooms and beds), which is about 0.47% of the sub-sample. These rows can therefore be deleted entirely without losing much information, leaving 9,953 observations.
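
The two treatments described in this subsection (replacing missing review scores with 0, then dropping rows with any other missing value) could look roughly like this. The data frame toy is made up for illustration, and the code is a sketch rather than the team's implementation.

```r
# Made-up example with missing values in three fields
toy <- data.frame(
  beds                 = c(1, NA, 2, 3, 2),
  bathrooms            = c(1, 1, NA, 1, 2),
  review_scores_rating = c(90, 85, NA, NA, 100)
)

# Step 1: impute missing review scores with 0, as decided in the text
toy$review_scores_rating[is.na(toy$review_scores_rating)] <- 0

# Step 2: drop every row that still has a missing value in any column
toy_clean <- toy[complete.cases(toy), ]

toy_clean
```

The order matters: imputing the review score first ensures that rows are dropped only because of the other fields.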

3.3 Correlation matrix

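# Load the packages used below
library(dplyr)     # select_if(), select(), %>%
library(corrplot)  # corrplot()

# Keep the numeric fields and drop the identifier columns
cor_BNC_Accomm <- select_if(BCN_Accomm, is.numeric) %>%
  select(-c(id, X, host_id))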

# make a data frame
cor_BNC_Accomm <- data.frame(cor_BNC_Accomm)
str(cor_BNC_Accomm)
## 'data.frame':    9953 obs. of  15 variables:
##  $ host_listings_count  : int  9 6 39 6 109 1 0 16 32 2 ...
##  $ latitude             : num  41.4 41.4 41.4 41.4 41.4 ...
##  $ longitude            : num  2.17 2.2 2.17 2.17 2.14 ...
##  $ accommodates         : int  5 5 6 16 4 4 1 2 2 4 ...
##  $ bathrooms            : num  1 1 1 6 1 1 1 1 1 1.5 ...
##  $ bedrooms             : int  2 2 2 7 2 1 1 0 1 1 ...
##  $ beds                 : int  5 5 5 13 2 2 1 1 2 2 ...
##  $ price                : num  105 25 85 899 83 45 29 50 165 70 ...
##  $ minimum_nights       : int  32 1 1 2 1 2 1 1 3 1 ...
##  $ availability_30      : int  9 18 21 3 2 0 10 17 19 17 ...
##  $ availability_60      : int  39 48 51 24 16 0 30 47 23 47 ...
##  $ availability_90      : int  40 78 71 47 26 0 60 77 42 59 ...
##  $ availability_365     : int  40 353 327 241 297 0 335 352 127 64 ...
##  $ number_of_reviews_ltm: int  0 12 2 7 0 0 2 4 9 21 ...
##  $ review_scores_rating : num  0 83 90 92 0 96 100 90 86 97 ...
# print correlation matrix
corrplot(cor(cor_BNC_Accomm), type = "upper", order = "hclust", tl.col = "black")

From the correlation matrix, the following characteristics can be deduced:

  • There is almost no correlation between the availability periods and the number of beds, bedrooms, or bathrooms. This suggests that a listing’s availability does not depend on those features but more probably on its location and facilities.
  • There is a positive correlation between the numbers of bedrooms, beds, and bathrooms.
  • The availability periods are positively correlated with one another.

3.4 Outliers (boxplots) (detect and treat) (CR)

Since the price variable is a key focus of our analysis, an outlier analysis of this variable has been conducted.

3.4.1 Count of outliers in Price

## [1] 804

Out of a total of 10,000 values, 804 (8.04%) are identified as outliers. Below, a boxplot is presented to visualize the median and the outlier observations.

From the boxplot above, it can be concluded that the median price is approximately 65€ per night, with 50% of the observations concentrated between 40€ (25th percentile) and 112€ (75th percentile), representing the interquartile range (IQR).

Additionally, the presence of numerous outliers extending to the right indicates a right-skewed distribution, meaning that a relatively small number of high prices pull the upper tail of the distribution away from the bulk of the listings.

The significant number of observations with higher prices could suggest the presence of many luxury properties. Therefore, further analysis is required to identify the factors influencing these price variations.
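
The report does not show how the outliers were counted. One common approach, assumed here, is the boxplot convention of flagging values more than 1.5 × IQR beyond the quartiles. The price vector below is made up, and R's boxplot() computes its hinges slightly differently from quantile(), so counts may differ marginally.

```r
# Made-up nightly prices with one extreme value
price <- c(40, 45, 50, 60, 65, 70, 90, 110, 112, 900)

# Quartiles and interquartile range
q   <- unname(quantile(price, c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Fences of the assumed 1.5 * IQR rule
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

# Flag and count the observations outside the fences
is_outlier <- price < lower | price > upper
sum(is_outlier)
price[is_outlier]
```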

3.4.2 Detecting Top 20 Price Outliers

Table 2: Top 20 Price Outliers with Neighbourhood

neighbourhood                    property_type       bedrooms   price
Gràcia                           Bed and breakfast          1    8000
Sants-Montjuïc                   Boat                       4    8000
Vila de Gràcia                   Bed and breakfast          1    8000
Vila de Gràcia                   Bed and breakfast          1    8000
Vila de Gràcia                   Bed and breakfast          1    8000
Vila de Gràcia                   Bed and breakfast          1    8000
Eixample                         Boutique hotel             1    6000
Eixample                         Hotel                      1    6000
Eixample                         Hotel                      1    6000
Eixample                         Hotel                      1    6000
Eixample                         Hotel                      1    6000
Eixample                         Hotel                      1    6000
La Nova Esquerra de l’Eixample   Hotel                      1    6000
La Nova Esquerra de l’Eixample   Hotel                      1    6000
La Nova Esquerra de l’Eixample   Hotel                      1    6000
La Nova Esquerra de l’Eixample   Hotel                      1    6000
Sant Antoni                      Boutique hotel             1    6000
Sant Antoni                      Boutique hotel             1    6000
Sant Antoni                      Hotel                      1    6000
Sant Antoni                      Hotel                      1    6000

It seems that the price variable may contain erroneous entries. For further analysis, research revealed that the average nightly rate for an Airbnb in Barcelona is €93 (according to Hostel Geeks, link: https://hostelgeeks.com/best-airbnbs-in-barcelona-spain/). Therefore, prices of €8,000 are likely errors. As a result, it was decided to exclude prices above €1,000 from the analysis.

The new summary for the Price variable is the following:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    7.00   40.00   65.00   96.93  110.00 1000.00
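
The exclusion of prices above €1,000 amounts to a simple row filter. The sketch below uses a made-up data frame (toy) rather than the real sub-sample; the threshold of 1,000 is the one stated in the text.

```r
# Made-up prices, including two implausible entries
toy <- data.frame(price = c(25, 110, 899, 1000, 6000, 8000))

# Keep listings priced at 1,000 or less; drop = FALSE keeps a data frame
toy_filtered <- toy[toy$price <= 1000, , drop = FALSE]

summary(toy_filtered$price)
```

Using `<=` keeps a listing priced at exactly 1,000, which is consistent with the maximum of 1000.00 in the summary above.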

Now, a new Correlation Matrix with the filtered observations of the price variable is displayed.

## 'data.frame':    9953 obs. of  15 variables:
##  $ host_listings_count  : int  9 6 39 6 109 1 0 16 32 2 ...
##  $ latitude             : num  41.4 41.4 41.4 41.4 41.4 ...
##  $ longitude            : num  2.17 2.2 2.17 2.17 2.14 ...
##  $ accommodates         : int  5 5 6 16 4 4 1 2 2 4 ...
##  $ bathrooms            : num  1 1 1 6 1 1 1 1 1 1.5 ...
##  $ bedrooms             : int  2 2 2 7 2 1 1 0 1 1 ...
##  $ beds                 : int  5 5 5 13 2 2 1 1 2 2 ...
##  $ price                : num  105 25 85 899 83 45 29 50 165 70 ...
##  $ minimum_nights       : int  32 1 1 2 1 2 1 1 3 1 ...
##  $ availability_30      : int  9 18 21 3 2 0 10 17 19 17 ...
##  $ availability_60      : int  39 48 51 24 16 0 30 47 23 47 ...
##  $ availability_90      : int  40 78 71 47 26 0 60 77 42 59 ...
##  $ availability_365     : int  40 353 327 241 297 0 335 352 127 64 ...
##  $ number_of_reviews_ltm: int  0 12 2 7 0 0 2 4 9 21 ...
##  $ review_scores_rating : num  0 83 90 92 0 96 100 90 86 97 ...

We can observe that the price variable is now most correlated with variables related to the size and capacity of an Airbnb, such as bathrooms, bedrooms, number of beds, and accommodates.

On the other hand, general availability and review scores have little impact on the price.

3.5 Histograms for Numerical Variables (CR)

From the histograms above, several variables exhibit right-skewed distributions, including price, minimum_nights, and number_of_reviews_ltm.

On the other hand, the data suggests that in Barcelona, Airbnb listings are primarily designed for small groups of people seeking short-term stays. Additionally, these accommodations tend to receive high review scores, indicating good guest satisfaction with the different properties.

3.6 Pie Chart for Binary Variable “host_is_superhost”

Almost 19% of the hosts offering an Airbnb in Barcelona are categorized as Superhosts. This means tourists can find accommodations in the city where hosts go above and beyond to provide excellent hospitality. This insight could be a key factor in explaining the higher prices observed in certain neighborhoods.

3.7 Plots for String Variables

This section presents a variety of plots for the categorical variables, including Property Type, Room Type, Top Neighbourhoods, and Amenities.

From the plot above, it can be observed that apartments dominate the Airbnb market in Barcelona, accounting for 86% of the listings.

On the other hand, the low availability of luxury or specialized accommodations, such as Boutique Hotels (0.5%), Guest Suites (0.7%), and Lofts (2.4%), suggests that these property types cater to a niche market. Travelers opting for these accommodations are likely visiting Barcelona for specific reasons, such as work or unique travel experiences.

Most listings are split between two room types: Entire home/apt and Private room.

Less than 1% of the hosts offer Shared Room, which suggests that travellers prefer more privacy during the stay.

The Eixample district of Barcelona represents the most popular neighbourhood on Airbnb, with 27% of the total listings. This is followed by Ciutat Vella, which accounts for 18.8% of the listings.

Eixample lies close to the historic centre of the city and is more centrally located than other neighbourhoods. The area offers many attractions for tourists, including La Sagrada Familia, Casa Batlló, and Passeig de Gràcia. Combined with its excellent transport connections, this makes Eixample an ideal destination for visitors.

On the other hand, Ciutat Vella is the oldest part of Barcelona and serves as the heart of the city, known for its historical charm and vibrant cultural scene.

Given this, the Tourism Company in Zürich could recommend that its clients focus on these neighbourhoods to attract more customers and enhance their travel experience.

Wordcloud for Amenities

The wordcloud above shows the most common amenities offered by the different hosts.

The most prominent amenities are Kitchen, Wifi, Heating, Washer, and Hair dryer. This can be taken to indicate that tourists may consider a place comfortable for their stay if it meets these basic requirements.
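
The frequencies behind such a wordcloud can be derived from the amenities column, which stores each list as a single string such as "['TV', 'Internet', 'Wifi', …]". The following sketch shows one possible way to count amenities in base R. The three strings are shortened, made-up examples in the dataset's format, and the simple splitting would mis-handle amenity names that themselves contain commas or apostrophes.

```r
# Made-up amenity strings in the same format as the dataset column
amenities <- c(
  "['TV', 'Internet', 'Wifi', 'Air conditioning']",
  "['TV', 'Cable TV', 'Wifi', 'Kitchen']",
  "['Wifi', 'Kitchen', 'Heating']"
)

# Strip the square brackets and the single quotes
stripped <- gsub("\\[|\\]|'", "", amenities)

# Split on commas, flatten into one vector, and trim spaces
items <- trimws(unlist(strsplit(stripped, ",")))

# Count each amenity and sort from most to least frequent
amenity_counts <- sort(table(items), decreasing = TRUE)
amenity_counts
```

A table like amenity_counts is the kind of word/frequency input that wordcloud functions typically expect.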

3.8 Summary Statistics (FB)

4. Visualization Insights

This section was designed to allow the employees of the Tourism Company and other Users to interact with the data on neighborhoods, prices, and reviews.

4.1 Airbnb locations by neighbourhood with Interactive panel

The purpose of the following interactive plot is to allow users to select a neighborhood of interest and visualize, on a map, the different accommodations available along with their price per night when one of the circles is clicked.

4.2 Heatmap of prices

In the heatmap below, users can observe the zones with higher accommodation prices (red/orange areas).

In contrast, the zones colored in green or blue represent lower-priced neighborhoods.

According to the heatmap, the Tourism Company can recommend the red zones to tourists looking for more centralized accommodations, regardless of price. On the other hand, tourists who want to save money can be advised to choose accommodations in the green or blue areas, which are typically farther from the city center.

4.3 Review Score Rating vs. Price by Room Type

From the plot above, the following insights can be derived:

  • The majority of listings are concentrated in the lower price range (below 250 Euros), irrespective of room type.
  • Accommodations with high review scores (exceeding 90 points) are distributed across all price categories, indicating that well-reviewed Airbnbs are not restricted to a particular room type or price range.

5. Machine Learning Models

In this chapter, different machine learning models will be explored to predict Airbnb prices and the occupancy rate over the next 30 days.

The formula to calculate the Occupancy rate in 30 days is:

\[ \text{Occupancy Rate} = \left(1 - \frac{\text{Availability 30 Days}}{30}\right) \times 100 \]

where 30 is the total number of days in the period.

According to the formula above, a new column with the occupancy rate is calculated. The first values of the new variable Occupancy_rate_30 are:

## [1]  70.00000  40.00000  30.00000  90.00000  93.33333 100.00000

This new variable serves as the response when predicting the occupancy rate over the next 30 days.
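
As a worked example of the occupancy formula, the first six availability_30 values of the sub-sample shown in Section 1.5 (9, 18, 21, 3, 2, 0) reproduce the six occupancy rates printed above. The code is a minimal sketch on a plain vector rather than on the full data frame.

```r
# First six availability_30 values from the sub-sample in Section 1.5
availability_30 <- c(9, 18, 21, 3, 2, 0)

# Share of the next 30 days that is NOT available, in percent
Occupancy_rate_30 <- (1 - availability_30 / 30) * 100

Occupancy_rate_30
```

Note that this measure treats every unavailable night as occupied, so nights blocked by the host also count towards occupancy.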

As mentioned in the previous chapters, the categorical variables are converted into factors to proceed with the modeling phase.

Train and Test data

Before analysing the different models, the data must be divided into a training set and a test set. The training set is used to estimate the relationship between the dependent and independent variables, while the test set is used to assess the performance of the models on unseen data. We use 60% of the dataset as the training set and the remaining 40% as the test set.
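
A 60/40 split of this kind could be sketched as follows. The data frame toy, the object names train and test, and the seed value are assumptions made for illustration; the report does not show its own splitting code.

```r
set.seed(123)  # assumed seed, so that the split is reproducible

# Made-up data frame standing in for the cleaned sub-sample
toy <- data.frame(id = 1:100, price = runif(100, min = 20, max = 300))

# Randomly pick 60% of the row indices for training
train_idx <- sample(nrow(toy), size = round(0.6 * nrow(toy)))

train <- toy[train_idx, ]   # used to fit the models
test  <- toy[-train_idx, ]  # held out to evaluate them

c(train = nrow(train), test = nrow(test))
```

Because the test rows are selected by negative indexing on the same index vector, no observation can end up in both sets.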

Variables for Pricing Model

Next, based on the Correlation Matrix shown in chapter 3.4, the variables used to address the first research question about price are:

  • bedrooms
  • bathrooms
  • accommodates
  • beds
  • latitude and longitude
  • review_scores_rating
  • minimum_nights
  • property_type
  • room_type
  • neighbourhood

Variables for occupancy Rate Model

The variables used for the Occupancy rate in one month are:

  • latitude and longitude (location)
  • bathrooms
  • bedrooms
  • accommodates
  • beds
  • price
  • minimum_nights
  • review_scores_rating
  • neighbourhood

5.1 Linear Model

It is used to analyse …. and answer the project question x

  • Accuracy
  • Precision
  • Recall
  • RMSE (Root Mean Squared Error)
  • MAE (Mean Absolute Error)
  • R Squared
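
Of the metrics listed for this model, RMSE, MAE, and R Squared apply to a numeric response such as price, whereas Accuracy, Precision, and Recall belong to classification tasks. The three regression metrics can be computed from actual and predicted values as sketched below. The helper regression_metrics() and the example numbers are hypothetical and not part of the report's code.

```r
# Hypothetical helper: regression metrics from actual and predicted values
regression_metrics <- function(actual, predicted) {
  err <- actual - predicted
  c(
    RMSE = sqrt(mean(err^2)),                       # root mean squared error
    MAE  = mean(abs(err)),                          # mean absolute error
    R2   = 1 - sum(err^2) / sum((actual - mean(actual))^2)  # share of variance explained
  )
}

# Made-up prices and predictions for illustration
actual    <- c(100, 50, 80, 120)
predicted <- c(90, 60, 80, 110)

m <- regression_metrics(actual, predicted)
round(m, 3)
```

Computing the metrics on the test set rather than the training set gives the fairer estimate of how a model generalises.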

5.2 Generalised Linear Model with family set to Poisson

It is used to analyse …. and answer the project question x

  • Accuracy
  • Precision
  • Recall
  • RMSE (Root Mean Squared Error)
  • MAE (Mean Absolute Error)
  • R Squared

5.3 Generalised Linear Model with family set to Binomial or Multinomial data

  1. What are the key factors influencing accommodation prices in Barcelona?
  2. Can we predict occupancy rates based on location, amenities, or other factors?

5.4 Generalised Additive Model

In this chapter, Generalized Additive Models (GAMs) are applied with the Price variable as the response, to analyze its relationship with the predictor variables.

The first goal is to identify the key factors influencing prices in Barcelona, addressing Research Question 1 (What are the key factors influencing accommodation prices in Barcelona?).

First, we aim to determine whether a nonlinear relationship exists between the independent variables and price. To explore this, the variable Review Score Rating will be plotted against Price to visualize whether the relationship is linear or not.

## `geom_smooth()` using formula = 'y ~ x'

From the plot above and Chapter 4.3 of this report, we observe that most points cluster on the right-hand side, where high review scores coincide with low prices. This suggests that Review Score Rating has no strong linear relationship with Price.

Given that at least one variable does not exhibit a linear relationship with price, we apply a Generalized Additive Model (GAM), which can capture potential nonlinear relationships.

5.4.1 GAM Model Training for Price Prediction

The GAM is fitted on the training data.

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## price ~ s(bathrooms) + s(bedrooms) + s(accommodates) + s(beds) + 
##     s(latitude) + s(longitude) + s(review_scores_rating) + s(minimum_nights) + 
##     room_type + neighbourhood
## 
## Parametric coefficients:
##                                              Estimate Std. Error t value
## (Intercept)                                    287.59      99.68   2.885
## room_typePrivate room                          -46.48       4.57 -10.169
## room_typeShared room                           -68.94      14.23  -4.844
## neighbourhoodCamp d'en Grassot i Gràcia Nova  -148.40     100.97  -1.470
## neighbourhoodCan Baro                         -103.47     105.54  -0.980
## neighbourhoodCarmel                            -94.38     103.32  -0.913
## neighbourhoodCiutat Vella                     -183.32     100.32  -1.827
## neighbourhoodDiagonal Mar - La Mar Bella      -129.25     102.61  -1.260
## neighbourhoodDreta de l'Eixample              -150.31      99.98  -1.503
## neighbourhoodEixample                         -166.24      99.83  -1.665
## neighbourhoodEl Baix Guinardó                 -153.27     102.44  -1.496
## neighbourhoodEl Besòs i el Maresme            -169.69     103.04  -1.647
## neighbourhoodEl Bon Pastor                    -159.11     113.36  -1.404
## neighbourhoodEl Born                          -179.28     101.33  -1.769
## neighbourhoodEl Camp de l'Arpa del Clot       -155.39     101.04  -1.538
## neighbourhoodEl Clot                          -148.60     102.47  -1.450
## neighbourhoodEl Coll                          -129.06     118.84  -1.086
## neighbourhoodEl Congrés i els Indians         -155.55     107.34  -1.449
## neighbourhoodel Fort Pienc                    -176.53     100.62  -1.754
## neighbourhoodEl Gòtic                         -191.99     100.63  -1.908
## neighbourhoodEl Poble-sec                     -183.61     100.74  -1.823
## neighbourhoodEl Poblenou                      -144.92     101.55  -1.427
## neighbourhoodEl Putget i Farró                -101.97     101.40  -1.006
## neighbourhoodEl Raval                         -185.13     100.45  -1.843
## neighbourhoodGlòries - El Parc                -175.09     101.37  -1.727
## neighbourhoodGràcia                           -149.17      99.85  -1.494
## neighbourhoodGuinardó                         -139.29     101.13  -1.377
## neighbourhoodHorta                             -92.63     135.31  -0.685
## neighbourhoodHorta-Guinardó                   -121.90      99.70  -1.223
## neighbourhoodL'Antiga Esquerra de l'Eixample  -156.20     100.18  -1.559
## neighbourhoodLa Barceloneta                   -162.37     101.36  -1.602
## neighbourhoodLa Font d'en Fargues             -148.44     112.25  -1.322
## neighbourhoodLa Maternitat i Sant Ramon       -205.57      99.97  -2.056
## neighbourhoodLa Nova Esquerra de l'Eixample   -165.07     100.30  -1.646
## neighbourhoodLa Prosperitat                    -79.77     137.10  -0.582
## neighbourhoodLa Sagrada Família               -170.82     100.21  -1.705
## neighbourhoodLa Sagrera                       -117.18     104.13  -1.125
## neighbourhoodLa Salut                         -133.19     102.32  -1.302
## neighbourhoodLa Teixonera                     -144.49     112.34  -1.286
## neighbourhoodLa Trinitat Vella                -183.15     120.73  -1.517
## neighbourhoodLa Verneda i La Pau              -155.21     105.30  -1.474
## neighbourhoodLa Vila Olímpica                 -138.96     102.00  -1.362
## neighbourhoodLes Corts                        -192.97      99.50  -1.939
## neighbourhoodLes Tres Torres                  -183.02     107.64  -1.700
## neighbourhoodMontbau                          -106.58     135.05  -0.789
## neighbourhoodNavas                            -151.56     103.03  -1.471
## neighbourhoodNou Barris                       -129.69     100.50  -1.291
## neighbourhoodPedralbes                        -214.17     117.40  -1.824
## neighbourhoodPorta                            -134.03     109.63  -1.223
## neighbourhoodProvençals del Poblenou          -162.56     103.96  -1.564
## neighbourhoodSant Andreu                      -127.31     100.18  -1.271
## neighbourhoodSant Andreu de Palomar           -128.94     103.53  -1.245
## neighbourhoodSant Antoni                      -179.28     100.42  -1.785
## neighbourhoodSant Genís dels Agudells         -133.29     112.16  -1.188
## neighbourhoodSant Gervasi - Galvany           -170.18     100.55  -1.693
## neighbourhoodSant Gervasi - la Bonanova       -168.19     119.00  -1.413
## neighbourhoodSant Martí                       -152.57     100.43  -1.519
## neighbourhoodSant Martí de Provençals         -160.63     104.23  -1.541
## neighbourhoodSant Pere/Santa Caterina         -180.09     100.60  -1.790
## neighbourhoodSants-Montjuïc                   -182.58     100.12  -1.824
## neighbourhoodSarrià                           -177.41      97.98  -1.811
## neighbourhoodSarrià-Sant Gervasi              -153.96      99.83  -1.542
## neighbourhoodTrinitat Nova                    -146.10     136.65  -1.069
## neighbourhoodTuró de la Peira - Can Peguera   -112.90     105.94  -1.066
## neighbourhoodVallcarca i els Penitents         -78.76     103.08  -0.764
## neighbourhoodVerdum - Los Roquetes            -138.45     110.82  -1.249
## neighbourhoodVila de Gràcia                   -107.88     100.10  -1.078
## neighbourhoodVilapicina i la Torre Llobeta    -158.61     109.05  -1.455
##                                              Pr(>|t|)    
## (Intercept)                                   0.00393 ** 
## room_typePrivate room                         < 2e-16 ***
## room_typeShared room                         1.31e-06 ***
## neighbourhoodCamp d'en Grassot i Gràcia Nova  0.14173    
## neighbourhoodCan Baro                         0.32696    
## neighbourhoodCarmel                           0.36105    
## neighbourhoodCiutat Vella                     0.06770 .  
## neighbourhoodDiagonal Mar - La Mar Bella      0.20788    
## neighbourhoodDreta de l'Eixample              0.13280    
## neighbourhoodEixample                         0.09594 .  
## neighbourhoodEl Baix Guinardó                 0.13464    
## neighbourhoodEl Besòs i el Maresme            0.09966 .  
## neighbourhoodEl Bon Pastor                    0.16051    
## neighbourhoodEl Born                          0.07692 .  
## neighbourhoodEl Camp de l'Arpa del Clot       0.12414    
## neighbourhoodEl Clot                          0.14705    
## neighbourhoodEl Coll                          0.27755    
## neighbourhoodEl Congrés i els Indians         0.14739    
## neighbourhoodel Fort Pienc                    0.07944 .  
## neighbourhoodEl Gòtic                         0.05647 .  
## neighbourhoodEl Poble-sec                     0.06842 .  
## neighbourhoodEl Poblenou                      0.15361    
## neighbourhoodEl Putget i Farró                0.31463    
## neighbourhoodEl Raval                         0.06539 .  
## neighbourhoodGlòries - El Parc                0.08419 .  
## neighbourhoodGràcia                           0.13523    
## neighbourhoodGuinardó                         0.16849    
## neighbourhoodHorta                            0.49366    
## neighbourhoodHorta-Guinardó                   0.22151    
## neighbourhoodL'Antiga Esquerra de l'Eixample  0.11901    
## neighbourhoodLa Barceloneta                   0.10927    
## neighbourhoodLa Font d'en Fargues             0.18612    
## neighbourhoodLa Maternitat i Sant Ramon       0.03981 *  
## neighbourhoodLa Nova Esquerra de l'Eixample   0.09986 .  
## neighbourhoodLa Prosperitat                   0.56069    
## neighbourhoodLa Sagrada Família               0.08831 .  
## neighbourhoodLa Sagrera                       0.26053    
## neighbourhoodLa Salut                         0.19309    
## neighbourhoodLa Teixonera                     0.19843    
## neighbourhoodLa Trinitat Vella                0.12931    
## neighbourhoodLa Verneda i La Pau              0.14054    
## neighbourhoodLa Vila Olímpica                 0.17316    
## neighbourhoodLes Corts                        0.05251 .  
## neighbourhoodLes Tres Torres                  0.08915 .  
## neighbourhoodMontbau                          0.43001    
## neighbourhoodNavas                            0.14137    
## neighbourhoodNou Barris                       0.19693    
## neighbourhoodPedralbes                        0.06817 .  
## neighbourhoodPorta                            0.22154    
## neighbourhoodProvençals del Poblenou          0.11796    
## neighbourhoodSant Andreu                      0.20387    
## neighbourhoodSant Andreu de Palomar           0.21305    
## neighbourhoodSant Antoni                      0.07426 .  
## neighbourhoodSant Genís dels Agudells         0.23472    
## neighbourhoodSant Gervasi - Galvany           0.09060 .  
## neighbourhoodSant Gervasi - la Bonanova       0.15761    
## neighbourhoodSant Martí                       0.12876    
## neighbourhoodSant Martí de Provençals         0.12335    
## neighbourhoodSant Pere/Santa Caterina         0.07350 .  
## neighbourhoodSants-Montjuïc                   0.06828 .  
## neighbourhoodSarrià                           0.07027 .  
## neighbourhoodSarrià-Sant Gervasi              0.12307    
## neighbourhoodTrinitat Nova                    0.28507    
## neighbourhoodTuró de la Peira - Can Peguera   0.28659    
## neighbourhoodVallcarca i els Penitents        0.44488    
## neighbourhoodVerdum - Los Roquetes            0.21160    
## neighbourhoodVila de Gràcia                   0.28117    
## neighbourhoodVilapicina i la Torre Llobeta    0.14587    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                           edf Ref.df      F  p-value    
## s(bathrooms)            4.974  6.008  8.753  < 2e-16 ***
## s(bedrooms)             4.372  5.383  3.148  0.00699 ** 
## s(accommodates)         5.345  6.276  9.863  < 2e-16 ***
## s(beds)                 1.804  2.296  2.179  0.10021    
## s(latitude)             6.629  7.860  8.555  < 2e-16 ***
## s(longitude)            6.672  7.900  4.342 3.37e-05 ***
## s(review_scores_rating) 4.581  5.497 12.340  < 2e-16 ***
## s(minimum_nights)       3.726  4.438 37.384  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.322   Deviance explained = 33.6%
## -REML =  28894  Scale est. = 8414.4    n = 4913

5.4.2 GAM Model Prediction of Pricing Evaluation

## R-squared on Test Set:  0.338637

5.4.3 GAM Model: Evaluating Predictive Pricing Performance

## MAE:  46.30068
## RMSE:  87.05353
## R-squared:  0.338637
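
The three metrics reported above can be computed as in the following Python sketch. It is an illustration under stated assumptions, not the code used for this report: R-squared is taken here as 1 minus the ratio of residual to total sum of squares, and the prices are hypothetical values that do not come from the data set.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, RMSE and R-squared for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r_squared = 1 - ss_res / ss_tot
    return mae, rmse, r_squared


# Hypothetical prices in euros, not taken from the data set.
actual_prices = [80.0, 120.0, 60.0, 200.0]
predicted_prices = [90.0, 110.0, 70.0, 180.0]
mae, rmse, r2 = regression_metrics(actual_prices, predicted_prices)
print(f"MAE: {mae:.2f}  RMSE: {rmse:.2f}  R-squared: {r2:.3f}")
```

Because RMSE squares each error before averaging, it is never smaller than MAE, and a wide gap between the two points to a small number of listings with very large prediction errors.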

5.4.4 GAM Model Training for Occupancy Prediction

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## occupancy_rate_30 ~ s(latitude) + s(longitude) + s(bathrooms) + 
##     s(bedrooms) + s(accommodates) + s(beds) + s(price) + s(minimum_nights) + 
##     s(review_scores_rating) + (neighbourhood)
## 
## Parametric coefficients:
##                                              Estimate Std. Error t value
## (Intercept)                                   93.9828    29.7302   3.161
## neighbourhoodCamp d'en Grassot i Gràcia Nova -21.6244    30.0530  -0.720
## neighbourhoodCan Baro                        -57.3858    31.5162  -1.821
## neighbourhoodCarmel                          -28.6511    30.9223  -0.927
## neighbourhoodCiutat Vella                    -21.7160    29.9352  -0.725
## neighbourhoodDiagonal Mar - La Mar Bella     -22.8594    30.5217  -0.749
## neighbourhoodDreta de l'Eixample             -23.5350    29.8054  -0.790
## neighbourhoodEixample                        -20.8334    29.7577  -0.700
## neighbourhoodEl Baix Guinardó                -22.6073    30.4980  -0.741
## neighbourhoodEl Besòs i el Maresme           -26.9060    30.6085  -0.879
## neighbourhoodEl Bon Pastor                     0.7423    34.0760   0.022
## neighbourhoodEl Born                         -12.2346    30.2609  -0.404
## neighbourhoodEl Camp de l'Arpa del Clot      -20.9535    30.0562  -0.697
## neighbourhoodEl Clot                         -29.5142    30.5036  -0.968
## neighbourhoodEl Coll                         -54.0424    35.8055  -1.509
## neighbourhoodEl Congrés i els Indians        -14.3995    32.2823  -0.446
## neighbourhoodel Fort Pienc                   -17.4830    30.0132  -0.583
## neighbourhoodEl Gòtic                        -19.6208    30.0208  -0.654
## neighbourhoodEl Poble-sec                    -20.4614    30.0859  -0.680
## neighbourhoodEl Poblenou                     -11.5547    30.2098  -0.382
## neighbourhoodEl Putget i Farró               -24.9957    30.0892  -0.831
## neighbourhoodEl Raval                        -21.0668    29.9798  -0.703
## neighbourhoodGlòries - El Parc               -19.9000    30.2422  -0.658
## neighbourhoodGràcia                          -18.6348    29.6868  -0.628
## neighbourhoodGuinardó                        -21.6961    30.1835  -0.719
## neighbourhoodHorta                           -50.2236    41.1562  -1.220
## neighbourhoodHorta-Guinardó                  -25.7994    29.7692  -0.867
## neighbourhoodL'Antiga Esquerra de l'Eixample -24.4582    29.8252  -0.820
## neighbourhoodLa Barceloneta                  -16.4498    30.2493  -0.544
## neighbourhoodLa Font d'en Fargues            -27.3096    33.8886  -0.806
## neighbourhoodLa Maternitat i Sant Ramon      -21.4781    30.0036  -0.716
## neighbourhoodLa Nova Esquerra de l'Eixample  -23.2732    29.9045  -0.778
## neighbourhoodLa Prosperitat                  -81.0111    41.6938  -1.943
## neighbourhoodLa Sagrada Família              -22.1307    29.8259  -0.742
## neighbourhoodLa Sagrera                      -23.8149    31.1802  -0.764
## neighbourhoodLa Salut                         -7.2049    30.4009  -0.237
## neighbourhoodLa Teixonera                    -27.2127    33.8132  -0.805
## neighbourhoodLa Trinitat Vella               -34.8536    35.9411  -0.970
## neighbourhoodLa Verneda i La Pau             -28.5639    31.5070  -0.907
## neighbourhoodLa Vila Olímpica                -21.1895    30.3786  -0.698
## neighbourhoodLes Corts                       -21.8761    29.6501  -0.738
## neighbourhoodLes Tres Torres                 -32.0797    32.0809  -1.000
## neighbourhoodMontbau                         -23.5499    41.0902  -0.573
## neighbourhoodNavas                           -17.8430    30.7322  -0.581
## neighbourhoodNou Barris                      -29.3390    29.9159  -0.981
## neighbourhoodPedralbes                       -29.5077    35.5738  -0.829
## neighbourhoodPorta                           -23.4399    32.9009  -0.712
## neighbourhoodProvençals del Poblenou         -19.1179    30.9081  -0.619
## neighbourhoodSant Andreu                     -21.4528    29.9403  -0.717
## neighbourhoodSant Andreu de Palomar          -20.5631    30.8933  -0.666
## neighbourhoodSant Antoni                     -21.1097    29.9761  -0.704
## neighbourhoodSant Genís dels Agudells        -32.7547    33.7449  -0.971
## neighbourhoodSant Gervasi - Galvany          -27.1806    29.8662  -0.910
## neighbourhoodSant Gervasi - la Bonanova      -50.0196    35.7186  -1.400
## neighbourhoodSant Martí                      -18.7222    29.8821  -0.627
## neighbourhoodSant Martí de Provençals        -17.4224    31.0041  -0.562
## neighbourhoodSant Pere/Santa Caterina        -24.0552    30.0181  -0.801
## neighbourhoodSants-Montjuïc                  -21.4089    29.8016  -0.718
## neighbourhoodSarrià                          -23.5800    29.9708  -0.787
## neighbourhoodSarrià-Sant Gervasi             -26.5323    29.6603  -0.895
## neighbourhoodTrinitat Nova                   -25.6116    41.2008  -0.622
## neighbourhoodTuró de la Peira - Can Peguera  -13.7704    31.8177  -0.433
## neighbourhoodVallcarca i els Penitents       -26.9538    30.6695  -0.879
## neighbourhoodVerdum - Los Roquetes           -24.9577    32.9081  -0.758
## neighbourhoodVila de Gràcia                  -18.4190    29.7581  -0.619
## neighbourhoodVilapicina i la Torre Llobeta   -32.3390    32.8714  -0.984
##                                              Pr(>|t|)   
## (Intercept)                                   0.00158 **
## neighbourhoodCamp d'en Grassot i Gràcia Nova  0.47184   
## neighbourhoodCan Baro                         0.06869 . 
## neighbourhoodCarmel                           0.35421   
## neighbourhoodCiutat Vella                     0.46822   
## neighbourhoodDiagonal Mar - La Mar Bella      0.45392   
## neighbourhoodDreta de l'Eixample              0.42979   
## neighbourhoodEixample                         0.48390   
## neighbourhoodEl Baix Guinardó                 0.45857   
## neighbourhoodEl Besòs i el Maresme            0.37943   
## neighbourhoodEl Bon Pastor                    0.98262   
## neighbourhoodEl Born                          0.68601   
## neighbourhoodEl Camp de l'Arpa del Clot       0.48575   
## neighbourhoodEl Clot                          0.33331   
## neighbourhoodEl Coll                          0.13128   
## neighbourhoodEl Congrés i els Indians         0.65558   
## neighbourhoodel Fort Pienc                    0.56025   
## neighbourhoodEl Gòtic                         0.51342   
## neighbourhoodEl Poble-sec                     0.49647   
## neighbourhoodEl Poblenou                      0.70212   
## neighbourhoodEl Putget i Farró                0.40617   
## neighbourhoodEl Raval                         0.48228   
## neighbourhoodGlòries - El Parc                0.51056   
## neighbourhoodGràcia                           0.53022   
## neighbourhoodGuinardó                         0.47229   
## neighbourhoodHorta                            0.22241   
## neighbourhoodHorta-Guinardó                   0.38618   
## neighbourhoodL'Antiga Esquerra de l'Eixample  0.41223   
## neighbourhoodLa Barceloneta                   0.58660   
## neighbourhoodLa Font d'en Fargues             0.42036   
## neighbourhoodLa Maternitat i Sant Ramon       0.47412   
## neighbourhoodLa Nova Esquerra de l'Eixample   0.43646   
## neighbourhoodLa Prosperitat                   0.05207 . 
## neighbourhoodLa Sagrada Família               0.45813   
## neighbourhoodLa Sagrera                       0.44503   
## neighbourhoodLa Salut                         0.81267   
## neighbourhoodLa Teixonera                     0.42098   
## neighbourhoodLa Trinitat Vella                0.33222   
## neighbourhoodLa Verneda i La Pau              0.36467   
## neighbourhoodLa Vila Olímpica                 0.48551   
## neighbourhoodLes Corts                        0.46067   
## neighbourhoodLes Tres Torres                  0.31738   
## neighbourhoodMontbau                          0.56659   
## neighbourhoodNavas                            0.56154   
## neighbourhoodNou Barris                       0.32678   
## neighbourhoodPedralbes                        0.40687   
## neighbourhoodPorta                            0.47623   
## neighbourhoodProvençals del Poblenou          0.53625   
## neighbourhoodSant Andreu                      0.47371   
## neighbourhoodSant Andreu de Palomar           0.50569   
## neighbourhoodSant Antoni                      0.48133   
## neighbourhoodSant Genís dels Agudells         0.33177   
## neighbourhoodSant Gervasi - Galvany           0.36283   
## neighbourhoodSant Gervasi - la Bonanova       0.16146   
## neighbourhoodSant Martí                       0.53099   
## neighbourhoodSant Martí de Provençals         0.57418   
## neighbourhoodSant Pere/Santa Caterina         0.42296   
## neighbourhoodSants-Montjuïc                   0.47256   
## neighbourhoodSarrià                           0.43146   
## neighbourhoodSarrià-Sant Gervasi              0.37108   
## neighbourhoodTrinitat Nova                    0.53422   
## neighbourhoodTuró de la Peira - Can Peguera   0.66519   
## neighbourhoodVallcarca i els Penitents        0.37953   
## neighbourhoodVerdum - Los Roquetes            0.44825   
## neighbourhoodVila de Gràcia                   0.53597   
## neighbourhoodVilapicina i la Torre Llobeta    0.32526   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                           edf Ref.df      F p-value    
## s(latitude)             1.019  1.038  0.178 0.69673    
## s(longitude)            2.902  3.807  1.155 0.30559    
## s(bathrooms)            3.571  4.461  4.499 0.00101 ** 
## s(bedrooms)             3.252  4.173  4.445 0.00146 ** 
## s(accommodates)         4.284  5.237  9.958 < 2e-16 ***
## s(beds)                 3.416  4.264  1.596 0.14195    
## s(price)                6.027  7.118 28.644 < 2e-16 ***
## s(minimum_nights)       5.898  6.840  6.892 < 2e-16 ***
## s(review_scores_rating) 3.332  4.055 24.927 < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.0904   Deviance explained = 10.9%
## -REML =  23250  Scale est. = 817.01    n = 4913

5.4.5 GAM Model Prediction of Occupancy Rate

## R-squared on Test Set (Occupancy 30):  0.04477809

5.4.6 GAM Model: Evaluating Predictive Occupancy Rate Performance

## MAE:  64.46847
## RMSE:  66.04594

5.5 Neural Network

5.5.1 Neural Network Model Training for Price Prediction

A neural network model is trained on the training data.

##                     Length Class      Mode    
## call                    5  -none-     call    
## response             4913  -none-     numeric 
## covariate           39304  -none-     numeric 
## model.list              2  -none-     list    
## err.fct                 1  -none-     function
## act.fct                 1  -none-     function
## linear.output           1  -none-     logical 
## data                    9  data.frame list    
## exclude                 0  -none-     NULL    
## net.result              1  -none-     list    
## weights                 1  -none-     list    
## generalized.weights     1  -none-     list    
## startweights            1  -none-     list    
## result.matrix          34  -none-     numeric

5.5.2 Neural Network Model Prediction and Pricing Evaluation

5.5.3 Neural Network Model: Evaluating Predictive Pricing Performance

## MAE:  48.05641
## RMSE:  90.50193
## R-squared:  0.2852029

5.5.4 Neural Network Model Training for Occupancy Rate Prediction

5.6 Support Vector Machine

It is used to analyse …. and answer the project question x

  • Accuracy
  • Precision
  • Recall
  • RMSE (Root Mean Squared Error)
  • MAE (Mean Absolute Error)
  • R Squared

5.7 Model comparison and evaluation (after models completed)

  • Accuracy
  • Precision
  • Recall
  • RMSE (Root Mean Squared Error)
  • MAE (Mean Absolute Error)
  • R Squared

6. Use of Generative AI (notes from everyone to collect and formulate)

  • how you used generative AI in redacting the group work (code-related questions, generate text, explain concepts…)
  • what was easy/hard/impossible to do with generative AI
  • what you had to pay attention to/be critical about when using the results obtained through the use of generative AI

7. Conclusions (together at the end)

  • Summary of key insights
  • Predictive Model Performance (can we answer our research questions?)
  • Implications
  • Limitations
  • Future work

8. Appendix (CR)

Table with the description of every variable and the type

9. References

  • website
  • literature